Improved Policy Networks for Computer Go

Author

  • Tristan Cazenave
Abstract

Golois uses residual policy networks to play Go. Two improvements to these residual policy networks are proposed and tested. The first one is to use three output planes. The second one is to add Spatial Batch Normalization.
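As a rough illustration of the second improvement, spatial batch normalization normalizes each feature plane (channel) using statistics pooled over the whole mini-batch and over every board position. The sketch below is not taken from Golois: the function name `spatial_batch_norm`, the nested-list layout `batch[n][c][y][x]`, and the `eps` value are assumptions made for the example, and the learnable scale and shift of a full layer are left out.

```python
import math


def spatial_batch_norm(batch, eps=1e-5):
    """Normalize each channel over the batch and both spatial axes.

    `batch` is a nested list indexed as batch[n][c][y][x] (hypothetical
    layout). The learnable scale and shift are omitted (treated as 1 and 0).
    """
    channels = len(batch[0])
    out = [[None] * channels for _ in batch]
    for c in range(channels):
        # Pool every value of channel c across all samples and positions.
        values = [v for sample in batch for row in sample[c] for v in row]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        inv_std = 1.0 / math.sqrt(var + eps)
        for n, sample in enumerate(batch):
            out[n][c] = [[(v - mean) * inv_std for v in row] for row in sample[c]]
    return out


# Two samples, two feature planes, on a toy 2x2 "board".
example = [
    [[[1.0, 2.0], [3.0, 4.0]], [[10.0, 10.0], [20.0, 20.0]]],
    [[[5.0, 6.0], [7.0, 8.0]], [[30.0, 30.0], [40.0, 40.0]]],
]
normalized = spatial_batch_norm(example)
```

After the call, each plane of `normalized` has approximately zero mean and unit variance when measured across both samples and all four positions, which is the property that distinguishes the spatial variant from per-position batch normalization.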

Similar resources

Improved Architectures for Computer Go

AlphaGo trains policy networks with both supervised and reinforcement learning and makes different policy networks play millions of games so as to train a value network. The reinforcement learning part requires a massive amount of computation. We propose to train networks for computer Go so that a given accuracy is reached with far fewer examples. We modify the architecture of the networks in order...

A Two-Threshold Guard Channel Scheme for Minimizing Blocking Probability in Communication Networks

In this paper, we consider the call admission problem in a cellular network with two classes of voice users. In the first part of the paper, we introduce a two-threshold guard channel policy and study its limiting behavior under stationary traffic. Then we give an algorithm for finding the optimal number of guard channels. In the second part of the paper, we give an algorithm, which minimizes th...

GOjen: tdGo Temporal Difference Learning of Go Playing Artificial Neural Networks

The original project description was: An existing Java application handling and visualizing Go games between human and computer players (including trained and evolved ANNs) should be improved and extended with Go-playing ANNs trained by temporal difference learning. This extension should serve as a basis for comparisons of TD learning with conventional ANN training and evolutionary methods...

An improved particle swarm optimization with a new swap operator for team formation problem

The formation of effective teams of experts plays a crucial role in successful projects, especially in social networks. In this paper, a new particle swarm optimization (PSO) algorithm is proposed for solving a team formation optimization problem by minimizing the communication cost among experts. The proposed algorithm is called improved particle optimization with new swap operator (IPSONSO...

Accessibility Evaluation in Biometric Hybrid Architecture for Protecting Social Networks Using Colored Petri Nets

In the last few decades, technological progress has produced important information systems that require high security and that use safe and efficient methods to protect their privacy. Protecting vital data against the threat of attackers is a major challenge. This has made it important and necessary to be sensitive to the authentication and identification of individuals in confidential n...

Publication date: 2017